Canberra distance on ranked lists
Abstract
The Canberra distance is the sum, over all ranked items, of the absolute difference between an item's two ranks divided by the sum of those ranks; it is thus a weighted version of the L1 distance. As a metric on permutation groups, the Canberra distance is a measure of disarray for ranked lists in which rank differences in the top positions incur higher penalties than movements in the bottom part of the lists. Here we describe the distance by assessing its main statistical properties, and we show extensions to partial ranked lists. We conclude by providing two examples of use in functional genomics.
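The definition in the abstract can be illustrated with a minimal sketch. It assumes two complete rankings of the same items, given as rank vectors with ranks starting at 1; the function name `canberra_distance` and the example lists are hypothetical and are not taken from the paper. Exact fractions are used only to make the values easy to read.

```python
from fractions import Fraction


def canberra_distance(ranks_a, ranks_b):
    """Canberra distance between two rank vectors (ranks start at 1).

    ranks_a[i] and ranks_b[i] are the ranks given to item i by the two lists.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rank vectors must have the same length")
    # Each term is |a - b| / (a + b); ranks are positive, so a + b is never 0.
    return sum(Fraction(abs(a - b), a + b) for a, b in zip(ranks_a, ranks_b))


identity = [1, 2, 3, 4]
swap_top = [2, 1, 3, 4]      # items ranked 1 and 2 trade places
swap_bottom = [1, 2, 4, 3]   # items ranked 3 and 4 trade places

print(canberra_distance(identity, swap_top))     # 2/3
print(canberra_distance(identity, swap_bottom))  # 2/7
```

The same one-position swap costs 2/3 at the top of the list and 2/7 at the bottom, which is the top-weighted behaviour the abstract describes. The plain L1 distance would give 2 in both cases.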
Similar resources
Algebraic stability indicators for ranked lists in molecular profiling – Supplementary Material – Rev1
1 Permutation groups and distances; 1.1 Distances; 2 Equivalence of Borda list with best average position list; 3 Metrics on partial lists; 4 Properties of the harmonic numbers; 5 Properties of the Canberra distance; 5.1 Proof; 5.2 The standard deviation indicator; 6 Normalized indicators; 7 Feature modules; 8 Dataset shaving experiment; 8.
Algebraic stability indicators for ranked lists in molecular profiling
MOTIVATION We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of solutions. In microarray studies, the ...
Comparing top-k XML lists
Systems that produce ranked lists of results are abundant. For instance, Web search engines return ranked lists of Web pages. There has been work on distance measure for list permutations, like Kendall tau and Spearman’s Footrule, as well as extensions to handle top-k lists, which are more common in practice. In addition to ranking whole objects (e.g., Web pages), there is an increasing number ...
An LSH Index for Computing Kendall's Tau over Top-k Lists
We consider the problem of similarity search within a set of top-k lists under the Kendall’s Tau distance function. This distance describes how related two rankings are in terms of concordantly and discordantly ordered items. As top-k lists are usually very short compared to the global domain of possible items to be ranked, creating an inverted index to look up overlapping lists is possible but...
A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection
Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by thei...
Publication date: 2009